† Corresponding author. E-mail:
Project supported by the National Natural Science Foundation of China (Grant No. 61872204), the Natural Science Fund of Heilongjiang Province, China (Grant No. F2017029), the Scientific Research Project of Heilongjiang Provincial Universities, China (Grant No. 135109236), and the Graduate Research Project, China (Grant No. YJSCX2019042).
Traditional compressed sensing algorithms reconstruct images by iteratively optimizing over a small number of measured values, so the computation is complex and the reconstruction time is long. Deep-learning-based compressed sensing algorithms can greatly shorten the reconstruction time, but most of them focus on the reconstruction network, and the random measurement matrix cannot capture image features well, so the improvement in reconstructed image quality is limited. Two networks are proposed to solve this problem. The first is IReconNet, an improved version of ReconNet, which replaces the traditional linear random measurement matrix with an adaptive nonlinear measurement network; both the reconstruction quality and the anti-noise performance are greatly improved. Because the measured values extracted by the measurement network also retain the spatial information of the image, the image can be reconstructed with a bilinear interpolation algorithm (Bilinear) and dilated convolution, and a second network, USDCNN, is therefore proposed. On the BSD500 dataset, at sampling rates of 0.25, 0.10, 0.04, and 0.01, the average peak signal-to-noise ratio (PSNR) of USDCNN is 1.62 dB, 1.31 dB, 1.47 dB, and 1.95 dB higher than that of MSRNet. Experiments show that the average reconstruction time of USDCNN is 0.2705 s, 0.3671 s, 0.3602 s, and 0.3929 s shorter than that of ReconNet. USDCNN also has a great advantage in anti-noise performance.
In order to recover an analog signal without distortion, the conventional Nyquist sampling frequency must be no less than twice the highest frequency in the signal spectrum, and the resulting large amount of data is not conducive to storage and transmission. In 2006, Candes et al. proposed the compressed sensing theory,[1–3] which can sample a signal at a frequency much lower than the Nyquist rate and still fully reconstruct the original signal with high probability.
The traditional compressed sensing reconstruction method is based on sparse prior knowledge and essentially solves an underdetermined system of equations (y = Φx). How to find the optimal solution of this underdetermined system is the key to reconstruction.[4] The main research directions are as follows: (i) Sparse representation: look for a sparse basis Ψ on which the projection of the signal x has the fewest non-zero elements. (ii) Measurement matrix: find a measurement matrix Φ that is incoherent with the sparse basis Ψ, so that the measured value y obtained by dimension reduction retains enough information about the original signal x. (iii) Reconstruction method: find a reconstruction method with short reconstruction time and good robustness while the reconstruction quality is ensured. For reconstruction, the convex relaxation method,[5,6] the greedy matching pursuit method,[7–10] and the Bayesian method[11–13] are usually used to solve the corresponding sparse coding problem. However, real images do not exactly satisfy sparsity in the transform domain, so the quality of images reconstructed by sparse-modeling algorithms is not high. Moreover, the multiple iterations make real-time reconstruction difficult, which restricts the development of compressed sensing technology.
Some real images do not exactly satisfy sparsity in the transform domain. With deep learning, the measured values can be extracted and the image reconstructed in a purely data-driven way, which relaxes the sparsity assumption on the image signal. Convolutional neural networks and stacked denoising autoencoder networks can extract high-quality image features and, through training, significantly improve the quality of image reconstruction. In Ref. [14], Mousavi et al. reconstructed images with a stacked denoising autoencoder (SDA) model and designed two kinds of networks: one uses a linear reconstruction method with the measured value as input, and the other uses an end-to-end nonlinear method with the original image as input. In Ref. [15], ReconNet was proposed, in which a coarse reconstructed image is obtained from a linear mapping network and a high-quality reconstructed image is then obtained with two SRCNN models;[16] its reconstruction quality is better than that of the SDA. In Ref. [17], DR2-Net, composed of a linear mapping network and four residual network blocks,[18] was proposed; the reconstruction quality is improved, but the reconstruction time is longer. Lian et al.[19] proposed MSRNet, comprising a linear mapping network and a multi-scale residual network; its reconstruction quality is better than that of DR2-Net, but the reconstruction time is still longer than that of ReconNet.
In this paper, we propose the IReconNet model. Experiments show that, by using the adaptive nonlinear measurement network, its reconstruction quality is better than those of ReconNet, DR2-Net, and MSRNet. Moreover, the measured values obtained by the measurement network still retain the spatial information of the image, so another model, USDCNN, is proposed by improving the reconstruction network. The reconstruction quality of USDCNN is better than that obtained by IReconNet with its fully connected layer, and the reconstruction time is also much shorter.
The traditional compressed sensing framework mainly includes three parts: data sampling, sparse representation, and data reconstruction. For an image of size n = w × h, vectorized into an n × 1 vector x, an m × n (m ≪ n) measurement matrix Φ is used to sample it,

y = Φx. (1)

Since m ≪ n, equation (1) is an underdetermined system of equations, so x cannot be recovered from the measured value y directly and a sparsity prior has to be imposed. The traditional optimization method requires multiple iterations to solve Eq. (1), so the computation is complex and the reconstruction time is long. It can also be seen from Eq. (1) that the measurement matrix determines how much information about the original signal x is retained in the measured value y.
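To make the sampling step concrete, here is a minimal numerical sketch (not the paper's code; the matrix normalization and the placeholder block are assumptions) of measuring one vectorized 33 × 33 block with a random Gaussian matrix at MR = 0.25, as the iterative baselines do:

```python
# Illustrative sketch: block-based compressed sampling with a random Gaussian
# measurement matrix, i.e. Eq. (1) applied to one vectorized 33 x 33 block.
import numpy as np

n = 33 * 33             # vectorized block length
MR = 0.25               # sampling rate
m = int(round(MR * n))  # number of measurements, m << n

Phi = np.random.randn(m, n) / np.sqrt(m)  # random Gaussian measurement matrix (assumed scaling)
x = np.random.rand(n)                     # a vectorized image block (placeholder data)
y = Phi @ x                               # measured value, y = Phi x
print(y.shape)                            # (272,) for MR = 0.25
```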
Because a large image has many complex features, reconstructing the entire image directly through deep learning requires many network layers, the reconstruction time is long, and the input image size is restricted. Dividing the image into blocks and compressing them separately allows high-quality image blocks to be reconstructed with fewer network layers; the blocks are then stitched into the full image, with no restriction on the image size.
In this article, the image is divided into 33 × 33 blocks and four sampling rates (MR = 0.25, 0.1, 0.04, 0.01) are used. IReconNet and USDCNN use the same measurement network, in which high-quality features are measured through convolutional layers. The reconstruction network of IReconNet is the same as that of ReconNet: the image block is reconstructed through a fully connected layer to obtain an approximate solution, which is then refined by convolutional layers; the reconstruction network of USDCNN instead reconstructs the block directly from the measured-value map by bilinear interpolation and dilated convolution.
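For illustration, a minimal sketch of the block-splitting and stitching step described above follows; the edge-padding strategy and the helper names are assumptions, not the paper's code.

```python
# Minimal sketch: split a grayscale image into 33x33 blocks, then stitch the
# (processed) blocks back and crop to the original size.
import numpy as np

def split_blocks(img, b=33):
    H, W = img.shape
    pH, pW = (-H) % b, (-W) % b                       # pad to a multiple of the block size
    img = np.pad(img, ((0, pH), (0, pW)), mode="edge")
    blocks = [img[i:i + b, j:j + b]
              for i in range(0, img.shape[0], b)
              for j in range(0, img.shape[1], b)]
    return blocks, img.shape                          # blocks plus the padded shape

def stitch_blocks(blocks, padded_shape, out_shape, b=33):
    out = np.zeros(padded_shape)
    k = 0
    for i in range(0, padded_shape[0], b):
        for j in range(0, padded_shape[1], b):
            out[i:i + b, j:j + b] = blocks[k]
            k += 1
    return out[:out_shape[0], :out_shape[1]]          # crop back to the original size
```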
In ReconNet, DR2-Net, and MSRNet, a random Gaussian measurement matrix Φ is used to reduce the dimension of the image block xi and obtain the measured value. In contrast, the measurement network first increases the dimension through convolutional layers to obtain sufficient features and then reduces the dimension to keep only the required features. BatchNorm[21] is used to prevent over-fitting and speed up training, the ReLU activation function[22] improves the expressive power of the network, and the last layer of the measurement network uses the Sigmoid activation function to map values into the range 0–1. The original image block xi is used as the input of the measurement network Fs(⋅), and the convolution-layer weights Ws are trained by the Adam[23] method, so the measured value can be expressed as yi = Fs(xi, Ws).
The measurement network in this paper consists of four convolutional layers and does not use any pooling layer; its structure is shown in the corresponding figure.
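For illustration, a minimal PyTorch sketch of a measurement network in this spirit follows; the kernel sizes, strides, and channel widths are assumptions (the paper specifies its own settings in the figure), and only the overall pattern (channel expansion, strided reduction, BatchNorm and ReLU, a final Sigmoid, no pooling) matches the description above.

```python
# Illustrative 4-layer convolutional measurement network (not the paper's exact layout).
import torch
import torch.nn as nn

class MeasureNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),   # raise the dimension
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, kernel_size=3, stride=2, padding=1),  # reduce
            nn.BatchNorm2d(16), nn.ReLU(inplace=True),
            nn.Conv2d(16, 8, kernel_size=3, stride=2, padding=1),   # reduce further
            nn.BatchNorm2d(8), nn.ReLU(inplace=True),
            nn.Conv2d(8, 1, kernel_size=3, stride=1, padding=1),    # measured-value map
            nn.Sigmoid(),                                           # map values into (0, 1)
        )

    def forward(self, x):       # x: (N, 1, 33, 33) image blocks
        return self.net(x)

y = MeasureNet()(torch.rand(4, 1, 33, 33))
print(y.shape)                  # torch.Size([4, 1, 9, 9]) with these assumed strides
```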
Since deep-learning-based algorithms such as MSRNet first turn the 33 × 33 block into a vector and then measure it, their measured values do not reveal the image outline. In order to visually show the effect of measuring an image with a random Gaussian measurement matrix, such a matrix is used to measure each column of the test image, and the results are then spliced together. The comparison at MR = 0.25 is shown in the corresponding figure.
It can be seen from the comparison that the measured values produced by the random Gaussian matrix lose the spatial structure of the image, whereas the measured values extracted by the measurement network still retain the outline and spatial information of the image.
(i) Linear generation network
IReconNet first reconstructs the image through a fully connected layer. According to the compressed sensing formula yi = Φxi, the measured value yi is a linear mapping of the image block, so a fully connected layer is used to map yi back to an approximate solution of the original image block xi, which is then reshaped into a 33 × 33 block.
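As a concrete illustration of this linear mapping stage, the following sketch (shapes assumed from the MR = 0.25 setting, where the measured value has 16 × 17 = 272 entries) maps a flattened measured value back to a 33 × 33 block with a single fully connected layer:

```python
# Minimal sketch of the fully connected linear mapping in the reconstruction stage.
import torch
import torch.nn as nn

m = 272                              # measured-value length at MR = 0.25 (16 x 17)
fc = nn.Linear(m, 33 * 33)           # linear mapping from y_i to the block estimate

y = torch.rand(8, m)                 # a batch of flattened measured values
x_hat = fc(y).view(-1, 1, 33, 33)    # coarse reconstruction, later refined by the SRCNN models
print(x_hat.shape)                   # torch.Size([8, 1, 33, 33])
```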
(ii) SRCNN model
As shown in the corresponding figure, two SRCNN models[16] are then used to further improve the quality of the approximate solution obtained from the fully connected layer.
(I) Dilated convolution
The receptive field is very important in image reconstruction: a large receptive field captures more image features, so the reconstruction quality is higher. In convolutional neural networks, large convolution kernels and pooling layers are generally used to increase the receptive field, but large kernels increase the computational complexity, and although a pooling layer does not, it loses some information and affects the reconstruction quality. Dilated convolution expands the kernel by inserting zeros between its elements, giving a large receptive field at no additional cost while keeping the size of the output feature map unchanged. For example, dilating a 3 × 3 convolution kernel with a dilation factor d = 3 yields a (2d + 1) × (2d + 1), i.e., 7 × 7, kernel in which only 9 positions are non-zero and the rest are zero. The dilated kernel is shown in the corresponding figure.
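A short PyTorch sketch of this idea follows (illustrative only): a 3 × 3 kernel with dilation 3 covers a 7 × 7 area, and matching the padding to the dilation keeps the feature-map size unchanged.

```python
# Dilated convolution: 3x3 kernel, dilation d = 3, effective 7x7 receptive field.
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, dilation=3, padding=3)  # padding matches the dilation
x = torch.rand(1, 1, 33, 33)
print(conv(x).shape)   # torch.Size([1, 1, 33, 33]); the spatial size is preserved
```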
(II) Up sampling
For the up-sampling method, in this paper we test nearest-neighbor interpolation (Nearest), bilinear interpolation (Bilinear), PixelShuffle,[24] and transposed convolution.[25] Nearest is fast but the reconstructed image quality is not high, while Bilinear gives high-quality reconstruction but is slower than Nearest. PixelShuffle transforms an r²C × H × W tensor into a C × rH × rW tensor by the sub-pixel operation; its reconstruction is of high quality and fast, although not at all sampling rates. With transposed convolution, the reconstructed image shows a checkerboard effect and noise. After comprehensive consideration, Bilinear is chosen as the up-sampling method in this paper.
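The four up-sampling options can be sketched as follows (the shapes are illustrative, not the paper's settings); each doubles the spatial resolution of a feature map.

```python
# Sketch of the up-sampling options discussed above.
import torch
import torch.nn as nn

x = torch.rand(1, 4, 16, 16)                                   # r^2*C x H x W with r = 2, C = 1

up_nearest  = nn.Upsample(scale_factor=2, mode="nearest")
up_bilinear = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
shuffle     = nn.PixelShuffle(2)                               # (1, 4, 16, 16) -> (1, 1, 32, 32)
deconv      = nn.ConvTranspose2d(4, 1, kernel_size=2, stride=2)

print(up_bilinear(x).shape, shuffle(x).shape, deconv(x).shape)
# torch.Size([1, 4, 32, 32]) torch.Size([1, 1, 32, 32]) torch.Size([1, 1, 32, 32])
```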
The structure of the USDCNN network is shown in the corresponding figure.
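The exact USDCNN layout is defined in that figure; purely as an illustration of how the two ingredients fit together, the sketch below combines bilinear up-sampling of the measured-value map with dilated convolutions. The channel widths, dilation rates, and block count are assumptions, not the paper's architecture.

```python
# Illustrative block only, not the paper's exact USDCNN: bilinear up-sampling
# followed by dilated convolutions that enlarge the receptive field.
import torch
import torch.nn as nn

class UpDilateBlock(nn.Module):
    def __init__(self, in_ch=1, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, ch, kernel_size=3, dilation=2, padding=2),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, in_ch, kernel_size=3, dilation=1, padding=1),
        )

    def forward(self, y):              # y: measured-value map, e.g. (N, 1, 9, 12) at MR = 0.10
        return self.body(y)            # spatially enlarged feature map

out = UpDilateBlock()(torch.rand(2, 1, 9, 12))
print(out.shape)                       # torch.Size([2, 1, 18, 24])
```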
In this paper, we use the data set of 91 images from DR2-Net. For fairness, the RGB images are converted to the YCrCb color space, as for ReconNet, MSRNet, and DR2-Net. The Y channel is selected and first scaled by factors of 0.75, 1, and 1.5; the images are then divided into 33 × 33 blocks with a step size of 14, giving a total of 87104 blocks used as the training set. At the four sampling rates (MR = 0.25, 0.1, 0.04, 0.01), the sizes of the measured-value matrices obtained from a 33 × 33 image block are [16, 17], [9, 12], [4, 11], and [2, 5], respectively. Although the measurement network improves the reconstruction quality, it also reduces the anti-noise ability, so during training Gaussian noise with intensity σ = 0.05 is added to the measured values to improve the robustness against Gaussian noise. The 11 images used in DR2-Net and the data set BSD500 serve as test images; BSD500 contains 500 test images of size 321 × 481 or 481 × 321. The networks are trained with the PyTorch open-source framework. All experiments are performed on a platform with an Intel Core i7-7700 CPU (main frequency 3.6 GHz), 16 GB of memory, and a Quadro M2000 graphics card.
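A data-preparation sketch under these settings is given below; the helper name, the cubic interpolation, and the normalization to [0, 1] are assumptions rather than the paper's exact preprocessing code.

```python
# Sketch: take the Y channel, rescale by 0.75/1.0/1.5, and cut 33x33 patches with stride 14.
import numpy as np
import cv2

def extract_patches(img_bgr, block=33, stride=14, scales=(0.75, 1.0, 1.5)):
    y = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)[:, :, 0].astype(np.float32) / 255.0
    patches = []
    for s in scales:
        ys = cv2.resize(y, None, fx=s, fy=s, interpolation=cv2.INTER_CUBIC)
        H, W = ys.shape
        for i in range(0, H - block + 1, stride):
            for j in range(0, W - block + 1, stride):
                patches.append(ys[i:i + block, j:j + block])
    return np.stack(patches)           # (num_patches, 33, 33)

# During training, Gaussian noise of intensity sigma = 0.05 is added to the measured
# values (not to the input image), e.g.  y_noisy = y + 0.05 * torch.randn_like(y)
```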
The training of IReconNet is divided into two steps. First, the measurement network and the fully connected layer are trained with a learning rate of 0.001; every 2 × 10⁵ iterations the learning rate is reduced to 0.5 times its previous value, and 1 × 10⁶ iterations are performed in total. Then the entire network is trained with a learning rate of 10⁻⁴ for 120 epochs, with the learning rate dropping to 0.5 times its previous value every 40 epochs. USDCNN trains the entire network directly with a learning rate of 10⁻³ for 1 × 10⁶ iterations in total, and the learning rate is reduced to 0.5 times its previous value every 2 × 10⁵ iterations. Both IReconNet and USDCNN are trained with the Adam method.
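For reference, the USDCNN schedule above corresponds to the following hedged PyTorch sketch; the network, the batch of blocks, and the MSE loss are placeholders (the loss function is not spelled out in this section).

```python
# Sketch of the USDCNN schedule: Adam, lr = 1e-3, halved every 2e5 iterations, 1e6 in total.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 1, kernel_size=3, padding=1))   # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200_000, gamma=0.5)
criterion = nn.MSELoss()                                            # assumed loss

# one training iteration on placeholder data; in the paper this loop runs 1e6 times
blocks = torch.rand(16, 1, 33, 33)
loss = criterion(model(blocks), blocks)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()     # StepLR counts iterations here, halving the rate every 2e5 steps
```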
In this paper, the TVAL3,[26] NLR-CS,[27] D-AMP,[28] ReconNet, DR2-Net, and MSRNet algorithms are compared. TVAL3, NLR-CS, and D-AMP are based on iterative optimization, while the rest are deep-learning-based algorithms. The experimental results are listed in the corresponding table.
In the table, the average PSNR of the reconstructed images at the four sampling rates is compared for each algorithm; the proposed networks achieve a higher average PSNR than the other algorithms.
The reconstructed images produced by the different algorithms are compared visually in the corresponding figure.
The structural similarity index (SSIM) measures the similarity between two images in terms of brightness, contrast, and structure. The value of SSIM lies between 0 and 1, and the closer the value is to 1, the more similar the two images are. After MSRNet reconstructs the image through its network, a correction step is applied to further improve the reconstruction quality. The PSNR and SSIM differences between the proposed algorithms and MSRNet (with the corrected reconstructed images) are shown in the corresponding figures.
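For reference, the SSIM between two image patches x and y is usually computed with the standard definition

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + c_1)\,(2\sigma_{xy} + c_2)}
       {(\mu_x^{2} + \mu_y^{2} + c_1)\,(\sigma_x^{2} + \sigma_y^{2} + c_2)}
```

where μx and μy are the local means, σx² and σy² the variances, σxy the covariance, and c1, c2 small constants that stabilize the division.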
As can be seen from the figures, the PSNR and SSIM of the proposed algorithms remain higher than those of MSRNet even after its reconstructed images are corrected.
The reconstruction time of the traditional iterative algorithms is much longer than that of the deep-learning-based algorithms,[30] so only the reconstruction times of the deep-learning-based algorithms are compared here. For a fair comparison, the average reconstruction time of each algorithm on the same platform is listed in the corresponding table.
IReconNet(Ff(⋅)) and USDCNN(Fmus(⋅)) denote the average time of the reconstruction network alone, while IReconNet(Fs(⋅) + Ff(⋅)) and USDCNN(Fs(⋅) + Fmus(⋅)) denote the average time of the measurement network plus the reconstruction network. As can be seen from the table, the average reconstruction time of USDCNN is shorter than that of ReconNet at all four sampling rates.
In order to test the generalization ability of the algorithms on a large data set, the reconstruction performance is compared with those of ReconNet, DR2-Net, and MSRNet on the data set BSD500 (500 images). As shown in the corresponding figure, the average PSNR of USDCNN is 1.62 dB, 1.31 dB, 1.47 dB, and 1.95 dB higher than that of MSRNet at sampling rates of 0.25, 0.10, 0.04, and 0.01, respectively.
At sampling rates MR = 0.25 and 0.10, Gaussian noise of four different intensities (σ = 0.01, 0.05, 0.10, 0.25) is added to the measured values. As shown in the corresponding figures, the proposed networks maintain a higher PSNR than the other algorithms as the noise intensity increases, showing better robustness to Gaussian noise.
At sampling rates MR = 0.25 and 0.10, salt-and-pepper noise of four different intensities (σ = 0.001, 0.01, 0.05, 0.10) is added to the measured values. As shown in the corresponding figures, the proposed networks again maintain a higher PSNR than the other algorithms as the noise intensity increases.
The images reconstructed at a sampling rate of MR = 0.25 with the four noise intensities added are shown in the corresponding figure.
As can be seen from the figure, the images reconstructed by the proposed networks remain clearer than those of the other algorithms as the noise intensity increases.
In this paper, we have proposed two deep-learning-based network models, IReconNet and USDCNN. The experimental results on 11 commonly used test images and the data set BSD500 show that replacing the random Gaussian measurement matrix with the measurement network in IReconNet greatly improves the reconstruction quality, which is better than that of the algorithms in the other literature. To exploit the fact that the measured values obtained by the measurement network still retain the spatial information of the image, USDCNN introduces Bilinear up-sampling and dilated convolution into the reconstruction network, and achieves better performance in reconstruction time, reconstruction quality, and robustness.
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]